Efficient decomposition-based multiclass and multilabel classification

نویسنده

  • Sang-Hyeun Park
چکیده

Decomposition-based methods are widely used for multiclass and multilabel classification. These approaches transform or reduce the original task to a set of smaller possibly simpler problems and allow thereby often to utilize many established learning algorithms, which are not amenable to the original task. Even for directly applicable learning algorithms, the combination with a decomposition-scheme may outperform the direct approach, e.g., if the resulting subproblems are simpler (in the sense of learnability). This thesis addresses mainly the efficiency of decomposition-based methods and provides several contributions improving the scalability with respect to the number of classes or labels, number of classifiers and number of instances. Initially, we present two approaches improving the efficiency of the training phase of multiclass classification. The first of them shows that by minimizing redundant learning processes, which can occur in decomposition-based approaches for multiclass problems, the number of operations in the training phase can be significantly reduced. The second approach is tailored to Näıve Bayes as base learner. By a tight coupling of Näıve Bayes and arbitrary decompositions, it allows an even higher reduction of the training complexity with respect to the number of classifiers. Moreover, an approach improving the efficiency of the testing phase is also presented. It is capable of reducing testing effort with respect to the number of classes independently of the base learner. Furthermore, efficient decomposition-based methods for multilabel classification are also addressed in this thesis. Besides proposing an efficient prediction method, an approach rebalancing predictive performance, time and memory complexity is presented. Aside from the efficiency-focused methods, this thesis contains also a study about a special case of the multilabel classification setting, which is elaborated, formalized and tackled by a prototypical decomposition-based approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Efficient Text Categorization Using a Min-Max Modular Support Vector Machine

The min-max modular support vector machine (M-SVM) has been proposed for solving large-scale and complex multiclass classification problems. In this paper, we apply the M-SVM to multilabel text categorization and introduce two task decomposition strategies into M-SVMs. A multilabel classification task can be split up into a set of two-class classification tasks. These two-class tasks are to dis...

متن کامل

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification

Top-k error is currently a popular performance measure on large scale image classification benchmarks such as ImageNet and Places. Despite its wide acceptance, our understanding of this metric is limited as most of the previous research is focused on its special case, the top-1 error. In this work, we explore two directions that shed light on the top-k error. First, we provide an in-depth analy...

متن کامل

Label Filters for Large Scale Multilabel Classification

When assigning labels to a test instance, most multilabel and multiclass classifiers systematically evaluate every single label to decide whether it is relevant or not. This linear scan over labels becomes prohibitive when the number of labels is very large. To alleviate this problem we propose a two step approach where computationally efficient label filters pre-select a small set of candidate...

متن کامل

Scalable Multilabel Prediction via Randomized Methods

Modeling the dependence between outputs is a fundamental challenge in multilabel classification. In this work we show that a generic regularized nonlinearity mapping independent predictions to joint predictions is sufficient to achieve state-of-the-art performance on a variety of benchmark problems. Crucially, we compute the joint predictions without ever obtaining any independent predictions, ...

متن کامل

A multiclass/multilabel document categorization system: Combining multiple classifiers in a reduced dimension

This article presents a multiclassifier approach for multiclass/multilabel document categorization problems. For the categorization process, we use a reduced vector representation obtained by SVD for training and testing documents, and a set of k-NN classifiers to predict the category of test documents; each k-NN classifier uses a reduced database subsampled from the original training database....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012